Student Challenge

Student Challenge Presentations

Presentation 01: mICKEY: An Integrated Pipeline for Biomarker Discovery and Deep Neural Network Models for Predicting Cancer Origin from DNA Methylation Data

Kasidech Aewsrisakul,
Natthawadee Leephatarakit,
Pakanan Tussanapirom,

Short Abstract: Cancer treatment is generally determined by the tissue of origin. However, up to 5% of all cancer cases are carcinoma of unknown primary (CUP), a type of metastatic cancer that poses additional challenges to identifying the primary cancer site and to successful treatment. Carcinogenesis is associated with extensive DNA methylation abnormalities at 5'-cytosine-phosphate-guanine-3' (CpG) islands across the genome. Several of these epigenetic modifications take place early in carcinogenesis and are widespread across tumor types. This makes DNA methylation a candidate cancer biomarker for early diagnosis, prediction of cancer tissue origin, and potential optimization of treatment. We propose a pipeline for 1) methylation biomarker discovery; 2) prediction of cancer tissue origin from methylation profiles using a deep learning model in conjunction with an end-to-end genetic algorithm, and potentially identification of the methylation signatures of each primary site from the model; and 3) development of an open-source application that allows general users who lack extensive knowledge of machine learning to analyze their data efficiently and effortlessly. Note: This is a team project by Pakanan Tussanapirom (19525133), Natthawadee Leephatarakit (20837888) and Kasidech Aewsrisakul (19525168).

Video: https://www.youtube.com/embed/d2G89X8d-Ws

Presentation 02: AI/ML Based Ovarify Application for an Early Recommendation of screening for Ovarian Cancer

Riya Davar,

Short Abstract: According to the CDC, ovarian cancer is the second most prevalent form of gynecologic cancer in the United States and the fifth leading cause of cancer death in women. The only reliable screening method is transvaginal sonography (TVS), which is both invasive and costly. The goal of this project was to use the Maximum Relevance Minimum Redundancy (mRMR) feature selection algorithm to select a panel of biomarkers from an ovarian cancer dataset and, more importantly, to create a non-invasive and inexpensive software tool that could help validate the panel and assist with the early detection of ovarian cancer with a reasonable level of sensitivity. This simple tool is a starting point for prototyping how a basic blood test can be used to generate a health profile of an individual based on historical datasets, enabling personalized treatments and better patient outcomes. The dataset has 49 features. The mRMR filter method eliminates redundant features while keeping the relevant features that affect the target class. The project met its final goal of creating a working web application that asks a clinician for a few basic blood test results and generates a recommendation for further screening. The application uses a Random Forest classifier built with the K best features picked by the mRMR algorithm to predict the likelihood of the disease and treatment targets, which could help reduce mortality from ovarian cancer. Random Forest has been shown to work well with smaller datasets (such as this project’s dataset), and the model had a sensitivity score of 0.96.
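The greedy selection idea behind mRMR can be sketched as follows. This is a minimal illustration and not the project's implementation: mRMR is usually formulated with mutual information, while this sketch substitutes absolute Pearson correlation for both relevance and redundancy so that it needs only NumPy. The function name `mrmr_select` and the toy data are hypothetical.

```python
import numpy as np

def mrmr_select(X, y, k):
    """Greedy mRMR-style selection of k feature indices.

    Assumption: |Pearson correlation| stands in for mutual information
    as the measure of relevance (feature vs. target) and redundancy
    (feature vs. feature).
    """
    n_features = X.shape[1]
    # Relevance of each feature to the target.
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    # Pairwise redundancy between features.
    redundancy = np.abs(np.corrcoef(X, rowvar=False))

    # Start with the single most relevant feature.
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        remaining = [j for j in range(n_features) if j not in selected]
        # Score = relevance minus mean redundancy with already chosen features.
        scores = [relevance[j] - redundancy[j, selected].mean() for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected

# Toy data: feature 1 is a near-duplicate of feature 0, feature 3 is noise.
rng = np.random.default_rng(0)
a, b = rng.normal(size=200), rng.normal(size=200)
X = np.column_stack([a, a + 0.01 * rng.normal(size=200), b, rng.normal(size=200)])
y = (a + b > 0).astype(float)
print(mrmr_select(X, y, k=2))
```

On this toy data the second pick is the independent informative feature rather than the near-duplicate, which is the redundancy-elimination behavior the abstract describes. The selected columns could then be passed to a classifier such as a Random Forest.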

Video: https://www.youtube.com/embed/9CJWw9MzvUQ

Presentation 03: Treating Mental Disorders with Technology

Sahit Jayaweera,

Short Abstract: In my project, I plan to analyze the similarities between the human brain and computers to develop a computerized method of diagnosing and treating mental disorders more accurately and efficiently. In a way, the brain functions like a computer: it takes in information from the world and uses it to generate an output that controls the human body. The main difference is that computers compute using a large collection of transistors, each of which can be either on or off. Computers treat inputs as combinations of ons and offs, or ones and zeros, and give outputs in the same form. In the human brain, by contrast, it has been shown that individual neurons are able to choose an output by themselves using some unknown method, without a large combination of neurons. If we can use this knowledge to build a computer that can simulate the human brain, we can run experiments on this model instead of on real patients. This can help us find causes of mental disorders and ways to treat them while avoiding real-life complications and ethical issues.

Video not uploaded

Presentation 04: Behind the Diagnosis: Discovering Biomarkers in Blood Samples For Early Alzheimer’s Diagnosis Using Deep learning algorithms

Grace Ko,
Paul Hwang,

Short Abstract: Presenters: Grace Ko and Paul Hwang. Alzheimer’s Disease (AD) is a neurodegenerative disease marked by loss of memory and cognitive function. As there is no cure for Alzheimer’s, early diagnosis is essential: it can lead to earlier access to treatment and more time to prepare financially. Current diagnostic methods are expensive and hard to access, which is why there has been a shift toward blood tests. The problem with blood-based diagnosis is its reliance on biomarkers, as only a few biomarkers of Alzheimer’s are known. We used an Adversarial Autoencoder (AAE) to increase sensitivity in identifying potential biomarkers in blood samples. In our experiment, we succeeded in identifying potential biomarkers for Alzheimer’s using a deep learning algorithm. Not only are these biomarkers related to the immune response system, but the molecular mechanisms of these genes are also enriched in Alzheimer’s. This enrichment further indicates the potential of these genes to be useful biomarkers in early diagnosis.

Video: https://www.youtube.com/embed/125d21TjhRY

Presentation 05: Developing and Evaluating Methods of Generating and Applying Polygenic Scores in a Diverse Cohort

Kien Lau,

Short Abstract: Genome-wide association studies (GWAS) have enabled researchers to summarize the effects of genetic variants associated with complex diseases in a single polygenic score (PS). Diabetes PS has previously been shown to be an accurate metric of an individual's genetic risk of developing diabetes, but there is limited literature on how these scores should be applied in a complicated, multi-ancestry cohort similar to real-world U.S.-based healthcare systems. Additionally, due to the historical exclusion of non-Europeans from genetic analyses, varying prediction accuracies of PS may exacerbate current health disparities. To alleviate the misrepresentation of GWAS data, we investigate novel computational approaches to develop and evaluate methods for generating transferable PS models that most accurately predict the prevalence of type 2 diabetes within the diverse, multi-ancestry Massachusetts General Brigham Biobank cohort. In this presentation, we propose potential future research directions for generating PS more accurately by considering rare variants in our models and incorporating more ancestry-identifying priors. We also raise broader questions about the societal impacts of PS implementation.
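As background, a polygenic score in its standard additive form is a weighted sum of an individual's risk-allele dosages, with weights taken from GWAS effect sizes. The sketch below shows only that basic form; the presenter's transferable multi-ancestry models are not described in enough detail to reproduce, and the function name, dosages, and effect sizes here are hypothetical toy values.

```python
import numpy as np

def polygenic_score(dosages, effect_sizes):
    """Additive polygenic score: PS_i = sum_j beta_j * g_ij.

    dosages: (individuals x variants) array of risk-allele counts (0, 1, or 2).
    effect_sizes: per-variant GWAS effect sizes (beta_j).
    """
    dosages = np.asarray(dosages, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    return dosages @ effect_sizes

# Hypothetical toy data: 3 individuals, 4 variants.
dosages = [[0, 1, 2, 0],
           [1, 1, 0, 2],
           [2, 0, 1, 1]]
betas = [0.5, -0.25, 0.125, 1.0]

scores = polygenic_score(dosages, betas)
print(scores)  # scores are 0.0, 2.25 and 2.125 for the three individuals
```

Because the effect sizes are estimated in a particular GWAS population, applying the same weights to individuals of other ancestries is where the prediction accuracy differences discussed in the abstract arise.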

Video: https://www.youtube.com/embed/GU9BnsuRbBo

Presentation 06: Knowledge transfer across multi-omics modalities using a novel fine-tuning strategy for comprehensive understanding of biological phenomena

Lakshmi Sritan Motati,

Short Abstract: Multi-omics approaches are critical to understanding biological systems in their entirety. Recent advancements in multi-omics, particularly in single-cell technologies, have led to the joint profiling of several omics modalities simultaneously from the same biological samples. Further, they have facilitated the development of large-scale heterogeneous single-cell datasets. However, current sequencing technologies have limitations that restrict their ability to collect more than two modalities at a time. Thus, it is challenging to link information between different modalities, especially when data from the distinct modalities are collected independently. In this study, we implemented an encoder-decoder neural network that can translate information from one modality (chromatin accessibility or gene expression data) to its corresponding translation product (gene expression or protein expression, respectively). While the neural network performed well when translating from gene expression to protein expression, it did not perform well when translating from chromatin accessibility to gene expression. To improve the predictive power of the model, we postulated that knowledge acquired in the auxiliary translation task (gene expression to protein expression) could be reused to improve model performance on the primary translation task (chromatin accessibility to gene expression). To do this, we used a two-step learning strategy: an encoder-decoder neural network was first trained to carry out the auxiliary translation and then fine-tuned to perform the primary translation. The input and output layers of the model were modified to match the dimensions of the primary translation’s modalities. Layer weights learned from the auxiliary task helped to transfer information between the single-cell modalities and improve model performance on the primary translation.
We further evaluated our model's performance on the NeurIPS 2021 Single-Cell Analysis Competition dataset. Our model’s performance on the test set was of the same order as that of baseline deep learning models, with an equal Pearson’s correlation value. Additionally, our approach led to a 33.2% decrease in normalized root mean square error for the translation of chromatin accessibility to gene expression when fine-tuned using the weights from the translation of gene expression to protein expression. The implications of this study are twofold: it reaffirms the use of deep learning for single-cell multi-omics while also evaluating a novel method for linking information across discrete modalities. Further, it paves the way for state-of-the-art methods for making biological inferences from multimodal data and its underlying complexity.
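The layer-swapping step of the two-step strategy can be illustrated with a small NumPy sketch. It assumes a three-layer network stored as a dictionary of weight matrices, with hypothetical layer names and dimensions; the presenter's actual architecture, framework, and training procedure are not given in the abstract, and the training loops themselves are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_model(n_in, n_hidden, n_out):
    """Tiny encoder-decoder stored as a dict of weight matrices (hypothetical layout)."""
    return {
        "input": rng.normal(scale=0.1, size=(n_in, n_hidden)),
        "hidden": rng.normal(scale=0.1, size=(n_hidden, n_hidden)),
        "output": rng.normal(scale=0.1, size=(n_hidden, n_out)),
    }

def forward(model, x):
    """Encode, pass through the shared hidden layer, then decode."""
    h = np.tanh(x @ model["input"])
    h = np.tanh(h @ model["hidden"])
    return h @ model["output"]

def transfer(aux_model, n_in, n_out):
    """Keep the hidden weights from the auxiliary model; re-initialize the
    input and output layers so they match the primary task's dimensions."""
    n_hidden = aux_model["hidden"].shape[0]
    new_model = init_model(n_in, n_hidden, n_out)
    new_model["hidden"] = aux_model["hidden"].copy()
    return new_model

# Step 1 (auxiliary task): gene expression -> protein expression.
aux = init_model(n_in=50, n_hidden=16, n_out=10)
# ... training on the auxiliary task would happen here ...

# Step 2 (primary task): chromatin accessibility -> gene expression.
primary = transfer(aux, n_in=200, n_out=50)
# ... fine-tuning on the primary task would happen here ...

pred = forward(primary, rng.normal(size=(4, 200)))
print(pred.shape)
```

Only the layers whose shapes depend on the modality are replaced; the shared hidden weights are the part that carries knowledge from the auxiliary task into fine-tuning.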

Video: https://www.youtube.com/embed/_yBX5ZKJU9Y

Presentation 07: Rapid and Automated Detection of Cancer and Immune Cells Using Novel Machine Learning Recognition Algorithms

Nesara Shree,

Short Abstract: The human body has over 200 different types of cells, each of which has a distinct function. Cures for many illnesses, such as cancer and neurological or cardiovascular diseases, involve killing specific dysfunctional cell groups. In cancer research and immunotherapy treatment, scientists must identify harmful cancer cells and healthy immune cells in a patient sample. Current identification methods are inefficient because a researcher must manually set conditions, or a ‘threshold value’, to identify each cell type independently. Machine learning image-recognition technologies enable researchers to identify cancer and immune cells faster than the existing approach because human intervention is not required. We are using Faster R-CNN implemented in Keras/TensorFlow, one of the best two-stage object detection models, which has been proven to work well with dense, small-scale objects such as cells. The model is trained on JPEG images, CSV renditions, and XML annotations and learns to recognize patterns in the data. Once training is completed and the accuracy is calculated, adjustments are made to the model or dataset. This training cycle repeats until the model achieves the highest possible accuracy. The subsequent validation process confirms that the model’s inferences regarding the cell type are accurate. This engineering project shows that our model has the capacity to identify cancer and immune cells significantly faster than the current approach. Further experimentation will cover not only cancer and immune cell identification but also differentiation between the cell types within one data sample. Applying this technology will allow researchers to improve upon current immunotherapy treatments.

Video not uploaded
